Write microbatch compiled + run code to separate target files #10743
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.
with patch_microbatch_end_time("2020-01-03 13:57:00"):
    run_dbt(["run", "--event-time-start", "2020-01-01"])

# Compiled paths - compiled model without filter only
After discussion with @graciegoheen, we decided it could still be useful to see the non-batched (no filters applied) model file, and at the top-level is where users would expect it. I've added a test to formalize this expected behaviour, but it's something that would be easy to change if we get beta feedback!
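A rough sketch of the kind of assertion such a test might make, assuming a hypothetical project name, model name, and per-batch file layout (none of which are taken from this PR):

import os

# Hypothetical layout: the unfiltered compiled model sits at the top level,
# while per-batch compiled files carry a date suffix.
def assert_microbatch_compiled_paths(project_root: str) -> None:
    compiled_dir = os.path.join(project_root, "target", "compiled", "my_project", "models")

    # Top-level file: the model compiled without any batch filters applied.
    assert os.path.exists(os.path.join(compiled_dir, "microbatch_model.sql"))

    # One compiled file per batch.
    for batch in ("2020-01-01", "2020-01-02", "2020-01-03"):
        assert os.path.exists(
            os.path.join(compiled_dir, "microbatch_model", f"microbatch_model_{batch}.sql")
        )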
Looks good! Thanks for doing this work. Found one aesthetic thing, but not blocking.
core/dbt/contracts/graph/nodes.py
Outdated
def format_batch_start(self, batch_start: Optional[datetime]) -> Optional[str]:
    if batch_start is None:
        return batch_start

    return str(
        batch_start.date()
        if (batch_start and self.config.batch_size != BatchSize.hour)
        else batch_start
    )
nit: Feels somewhat weird that this is on the ParsedNode class instead of the MicrobatchBuilder class. Just feels out of place.
Totally agree! I think at first I didn't do this because it's also needed in providers.py. Added it as a static method. Good call 👍
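For illustration only, a minimal sketch of what the relocated helper could look like as a static method on the builder; the import path, parameter shape, and class location here are assumptions, not the merged code (as a static method it can no longer read self.config, so the batch size is passed in):

from datetime import datetime
from typing import Optional

from dbt.artifacts.resources.types import BatchSize  # import path assumed


class MicrobatchBuilder:
    @staticmethod
    def format_batch_start(
        batch_start: Optional[datetime], batch_size: BatchSize
    ) -> Optional[str]:
        # Non-hourly batches are identified by date alone; hourly batches
        # keep the full timestamp.
        if batch_start is None:
            return batch_start
        return str(
            batch_start.date() if batch_size != BatchSize.hour else batch_start
        )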
Resolves #10714
Problem
During execution of microbatch models, each batch is re-compiled with batch-level time filters. Currently, this results in the compiled/run .sql files being clobbered on each write, when there should be one file per compiled/run batch.
Solution
Pass an additional argument, split_suffix (naming suggestions welcome!), that splits the write path by the suffix and writes to a top-level directory without the suffix.
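As a loose sketch of the idea, not the implementation in this PR, a path-building helper could branch on an optional split_suffix, keeping the unsuffixed file at the top level and nesting suffixed per-batch files under a model-named directory:

import os
from typing import Optional


def build_write_path(
    target_path: str, model_path: str, split_suffix: Optional[str] = None
) -> str:
    # Illustrative sketch: the unsuffixed file keeps its original top-level
    # path; suffixed (per-batch) files go into a subdirectory named after
    # the model, with the suffix appended to the file name.
    root, ext = os.path.splitext(model_path)
    if split_suffix is None:
        return os.path.join(target_path, model_path)
    model_name = os.path.basename(root)
    return os.path.join(
        target_path,
        os.path.dirname(model_path),
        model_name,
        f"{model_name}_{split_suffix}{ext}",
    )

Under these assumptions, build_write_path("target/compiled", "models/microbatch_model.sql", "2020-01-01") yields target/compiled/models/microbatch_model/microbatch_model_2020-01-01.sql, while calling it without a suffix leaves the original top-level path untouched.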